
Lyapunov Learning at the Onset of Chaos

Benati, Matteo, Londei, Alessandro, Lanzieri, Denise, Loreto, Vittorio

arXiv.org Artificial Intelligence

Handling regime shifts and non-stationary time series in deep learning systems presents a significant challenge. In online learning, newly introduced information can disrupt previously stored data and alter the model's overall paradigm, especially with non-stationary data sources. It is therefore crucial for neural systems to adapt quickly to new paradigms while preserving the past knowledge that remains relevant to the overall problem. In this paper, we propose a novel training algorithm for neural networks called Lyapunov Learning. This approach leverages the properties of nonlinear chaotic dynamical systems to prepare the model for potential regime shifts. Drawing inspiration from Stuart Kauffman's Adjacent Possible theory, we exploit local unexplored regions of the solution space to enable flexible adaptation. The neural network is designed to operate at the edge of chaos, where the maximum Lyapunov exponent, which quantifies a system's sensitivity to small perturbations, evolves around zero over time. Our approach yields significant improvements in experiments involving regime shifts in non-stationary systems. In particular, we train a neural network to handle an abrupt change in the parameters of the Lorenz chaotic system. The network equipped with Lyapunov Learning significantly outperforms regular training, improving the loss ratio by about 96%.
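The maximum Lyapunov exponent at the heart of this method can be estimated numerically with the classic two-trajectory (Benettin-style) procedure: evolve a reference and a slightly perturbed trajectory, renormalize their separation at each step, and average the log stretching rate. A minimal sketch on the Lorenz system (not the paper's code; the standard parameters are known to give an exponent of roughly 0.9):

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, s, dt):
    """One classical Runge-Kutta step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def max_lyapunov(n_steps=20000, dt=0.01, d0=1e-8, transient=1000):
    """Benettin-style estimate: track a nearby trajectory, renormalize
    its separation each step, and average the log stretching rate."""
    s = np.array([1.0, 1.0, 1.0])
    for _ in range(transient):          # settle onto the attractor first
        s = rk4_step(lorenz, s, dt)
    p = s + np.array([d0, 0.0, 0.0])    # perturbed companion trajectory
    log_sum = 0.0
    for _ in range(n_steps):
        s = rk4_step(lorenz, s, dt)
        p = rk4_step(lorenz, p, dt)
        d = np.linalg.norm(p - s)
        log_sum += np.log(d / d0)
        p = s + (p - s) * (d0 / d)      # rescale separation back to d0
    return log_sum / (n_steps * dt)

lam = max_lyapunov()  # ~0.9 for the standard Lorenz parameters
```

Training "at the onset of chaos" then amounts to steering an analogous exponent of the network's own dynamics toward zero rather than computing it for an external system.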


CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models

Gong, Zi, Yu, Hang, Liao, Cong, Liu, Bingchang, Chen, Chaoyu, Li, Jianguo

arXiv.org Artificial Intelligence

Multi-task learning (MTL) benefits the fine-tuning of large language models (LLMs) by providing a single model with improved performance and generalization ability across tasks, presenting a resource-efficient alternative to developing separate models for each task. Yet, existing MTL strategies for LLMs often fall short by either being computationally intensive or failing to ensure simultaneous task convergence. This paper presents CoBa, a new MTL approach designed to effectively manage task convergence balance with minimal computational overhead. Utilizing Relative Convergence Scores (RCS), Absolute Convergence Scores (ACS), and a Divergence Factor (DF), CoBa dynamically adjusts task weights during the training process, ensuring that the validation losses of all tasks progress toward convergence at an even pace while mitigating the issue of individual task divergence. The results of our experiments on three disparate datasets show that this approach not only fosters equilibrium in task convergence but also enhances the LLMs' performance by up to 13% relative to the second-best baselines. Code is open-sourced at https://github.com/codefuse-ai/MFTCoder.
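CoBa's precise RCS, ACS, and DF formulas are defined in the paper; as a rough, hypothetical sketch of the convergence-balancing idea alone (slower-converging tasks receive larger weights; the divergence factor is omitted), one could re-weight tasks by the recent slope of their validation losses:

```python
import numpy as np

def convergence_weights(val_loss_hist, window=5, tau=1.0):
    """Hypothetical simplification of convergence balancing: weight each
    task by the recent slope of its validation loss, so slower-converging
    tasks receive more weight. CoBa's actual RCS/ACS/DF terms are richer.

    val_loss_hist: (num_tasks, num_steps) per-task validation losses.
    """
    hist = np.asarray(val_loss_hist)[:, -window:]
    steps = np.arange(hist.shape[1])
    # Least-squares slope of each task's recent validation-loss curve.
    slopes = np.array([np.polyfit(steps, h, 1)[0] for h in hist])
    # Normalize by the current loss so differently scaled tasks compare fairly.
    rel = slopes / (hist[:, -1] + 1e-8)
    # Softmax: a flatter (less negative) relative slope yields a larger weight.
    w = np.exp(rel / tau)
    return w / w.sum()
```

Note that a diverging task (positive slope) would also receive more weight under this toy scheme, which is exactly the failure mode CoBa's divergence factor exists to counteract.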


BayesBlend: Easy Model Blending using Pseudo-Bayesian Model Averaging, Stacking and Hierarchical Stacking in Python

Haines, Nathaniel, Goold, Conor

arXiv.org Machine Learning

Averaging predictions from multiple competing inferential models frequently outperforms predictions from any single model, provided that the models are optimally weighted to maximize predictive performance. This is particularly the case in so-called $\mathcal{M}$-open settings where the true model is not in the set of candidate models, and may be neither mathematically reifiable nor known precisely. This practice of model averaging has a rich history in statistics and machine learning, and there are currently a number of methods to estimate the weights for constructing model-averaged predictive distributions. Nonetheless, few existing software packages can estimate model weights from the full variety of methods available, and none blend model predictions into a coherent predictive distribution according to the estimated weights. In this paper, we introduce the BayesBlend Python package, which provides a user-friendly programming interface to estimate weights and blend multiple (Bayesian) models' predictive distributions. BayesBlend implements pseudo-Bayesian model averaging, stacking and, uniquely, hierarchical Bayesian stacking to estimate model weights. We demonstrate the usage of BayesBlend with examples of insurance loss modeling.
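Independently of BayesBlend's own API, pseudo-Bayesian model averaging reduces to a softmax over per-model expected log predictive density (ELPD) estimates. A generic sketch (function names are ours, not BayesBlend's):

```python
import numpy as np

def pseudo_bma_weights(log_lik):
    """Pseudo-Bayesian model averaging: a softmax over per-model expected
    log predictive densities (e.g. pointwise LOO log densities).

    log_lik: (num_models, num_obs) pointwise log predictive densities.
    """
    elpd = np.asarray(log_lik).sum(axis=1)  # per-model ELPD estimate
    w = np.exp(elpd - elpd.max())           # shift by the max for stability
    return w / w.sum()

def blend_predictions(weights, preds):
    """Blend per-model predictions (first axis indexes models) by the weights."""
    return np.tensordot(weights, np.asarray(preds), axes=1)
```

Stacking and hierarchical stacking replace this closed-form softmax with an optimization (or a full Bayesian model) over the weights, which is where a dedicated package earns its keep.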


The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models

Li, Conglong, Zhang, Minjia, He, Yuxiong

arXiv.org Artificial Intelligence

Recent works have demonstrated great success in pre-training large-scale autoregressive language models on massive GPUs. To reduce the wall-clock training time, a common practice is to increase the batch size and learning rate. However, such practice is often brittle and leads to a so-called stability-efficiency dilemma: increasing the batch size and learning rate improves training efficiency but can also result in training instability, leading to poor generalization accuracy or failed runs. To better understand this phenomenon, we conduct an in-depth analysis of large-scale pre-training experiments replicating the GPT-2 model. We find a strong correlation between training instability and extreme values of gradient variance, and that samples with long sequence lengths contribute to these extreme gradient variance values, especially at the beginning of training, indicating that long sequence lengths can be a main source of training instability. Based on this analysis, we present a Sequence Length Warmup method that aims to solve the training stability-efficiency dilemma. Experiments replicating GPT-2 models show that our approach enables stable training with an 8x larger batch size and a 4x larger learning rate, whereas the baseline approach struggles with training instability. To achieve the same or better zero-shot evaluation results, our method reduces the required number of training tokens and wall-clock time by up to 2.2x and 3.7x, respectively. Experiments replicating the GPT-3 model (125M) show that our approach enables stable training with an 8x larger batch size and a 40x larger learning rate, and retains 99% of the zero-shot accuracy on 11 tasks using 10x less data and 17x less time compared to the original GPT-3 training recipe, while the baseline diverges under the same settings and retains only 95% of the accuracy at a lower learning rate.
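The warmup itself can be as simple as a linear ramp on the sequence length; a hypothetical sketch (the paper's exact schedule and hyperparameters may differ):

```python
def seq_len_schedule(step, warmup_steps, min_len=64, max_len=2048, multiple=8):
    """Hypothetical linear sequence-length warmup: train on short sequences
    first and ramp up to the full context length over `warmup_steps`.
    The paper's exact schedule and hyperparameters may differ."""
    if step >= warmup_steps:
        return max_len
    length = int(min_len + (step / warmup_steps) * (max_len - min_len))
    # Round down to a hardware-friendly multiple, never below min_len.
    return max(min_len, (length // multiple) * multiple)

# Usage: truncate each batch to the scheduled length, e.g.
#   batch = batch[:, :seq_len_schedule(step, warmup_steps=1000)]
```

Truncating to short sequences early keeps per-sample gradient variance low exactly when the analysis above says instability is most likely.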


Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

Frei, Spencer, Vardi, Gal, Bartlett, Peter L., Srebro, Nathan, Hu, Wei

arXiv.org Artificial Intelligence

The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an $\ell_2$-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training. We provide experiments which suggest that a small initialization scale is important for finding low-rank neural networks with gradient descent.
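The single-step rank collapse can be checked numerically by comparing the stable rank (a smooth proxy for rank) of the first-layer weights before and after one large gradient step from a small initialization. A self-contained sketch with illustrative dimensions (not the paper's experimental settings):

```python
import numpy as np

def leaky_relu(z, alpha=0.1):
    return np.where(z > 0, z, alpha * z)

def stable_rank(W):
    """||W||_F^2 / ||W||_2^2: a smooth proxy for the rank of W."""
    s = np.linalg.svd(W, compute_uv=False)
    return (s ** 2).sum() / s[0] ** 2

rng = np.random.default_rng(0)
n, d, m, alpha = 20, 500, 50, 0.1                 # illustrative sizes only
X = rng.standard_normal((n, d)) / np.sqrt(d)      # high-dim data: rows nearly orthogonal
y = rng.choice([-1.0, 1.0], size=n)
W = 1e-6 * rng.standard_normal((m, d))            # small first-layer initialization
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer

# One gradient-descent step on the logistic loss (1/n) sum_i log(1 + exp(-y_i f(x_i))).
pre = X @ W.T                                     # (n, m) pre-activations
f = leaky_relu(pre, alpha) @ a                    # network outputs
lgrad = -y / (1.0 + np.exp(y * f))                # d loss_i / d f_i
phi_prime = np.where(pre > 0, 1.0, alpha)         # leaky ReLU derivative
grad_W = ((lgrad[:, None] * phi_prime * a[None, :]).T @ X) / n
W_new = W - 10.0 * grad_W                         # one large step from small init

rank_before, rank_after = stable_rank(W), stable_rank(W_new)
```

With a small initialization the gradient dominates the update, and its rows concentrate around a single shared direction, so the stable rank drops sharply after one step, consistent with the paper's rank-at-most-two picture.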


1 Growth Stock Down 85% to Buy Right Now

#artificialintelligence

It's no secret the technology sector of the stock market has been crushed this year. The Nasdaq 100 index, a widely followed benchmark for high-growth tech companies, has declined by 29% in 2022 so far. But many individual stocks have been hit even harder, particularly those focused on serving consumers, since that focus makes them more vulnerable to the broader economic slowdown. Interest rates have been rising because inflation recently topped a 40-year high, and that's placing a stranglehold on people's spending power. Still, some consumer-centric companies have managed to maintain rapid growth rates in this difficult period.


2 Artificial-Intelligence Growth Stocks Shaping the Future of Technology

#artificialintelligence

Innovative technologies have regularly reshaped the world. In the last few decades, inventions like the personal computer, the internet, and the smartphone have dramatically enhanced human productivity, while creating tremendous wealth in the process. And artificial intelligence (AI) promises to be the next transformative technology. In fact, consulting firm McKinsey estimates that AI could boost global economic output by 16% (or $13 trillion) between 2018 and 2030. Companies like Nvidia (NVDA 1.74%) and Lemonade (LMND -6.03%) could be major beneficiaries of that trend because both are using AI to shape the future of technology.


GuideOne Selects Betterview

#artificialintelligence

Betterview, an InsurTech provider of actionable property intelligence to property and casualty (P&C) insurance companies, is pleased to announce that GuideOne Insurance Company (GuideOne) has selected the Betterview Property Intelligence & Risk Management Platform for implementation. GuideOne, a leading provider of coverage for religious organizations, educational institutions, and nonprofit and human services organizations across all 50 states for over 75 years, needed a solution to increase underwriting efficiency and strengthen risk management processes for commercial properties. Structurally, religious organizations often have complex roofs and pose a greater risk to insurers because the buildings are vacant most of the week. This puts such organizations and facilities at greater risk of large losses when they are not sufficiently managed, maintained, or monitored. "GuideOne works differently than other insurers," said Betterview co-founder and chief operations officer David Tobias.


Gradient AI Joins Guidewire Insurtech Vanguards Program

#artificialintelligence

BOSTON--(BUSINESS WIRE)--Gradient AI, a leading enterprise software provider of artificial intelligence (AI) solutions for the insurance industry, announced that the company has joined Guidewire's Insurtech Vanguards program, a new initiative led by property and casualty (P&C) cloud platform provider Guidewire (NYSE: GWRE), to help insurers learn about the newest insurtechs and how to best leverage them. "Guidewire is one of the most recognized platform providers in the insurance industry today, and we are proud to be working with the company," said Stan Smith, founder and CEO, Gradient AI. "As a part of the Guidewire Insurtech Vanguards program, we look forward to helping insurers improve underwriting and claim processes with our AI-powered insurance solutions." Insurtech Vanguards is a community of select startups and technology providers that are bringing novel solutions to the P&C industry. As part of the program, Guidewire provides strategic guidance to and advocates for the participating insurtechs, while connecting them with Guidewire's P&C customers. "Gradient AI is an effective, innovative, and proven insurance solution providing insurers the intelligence needed to significantly improve their efficiency and profitability in claims and underwriting operations," said Laura Drabik, chief evangelist, Guidewire.


How is AI improving profitability?

#artificialintelligence

"The first step in the process is to identify opportunities of similarly situated or homogeneous risks. Once we know what product we're underwriting and what the target risks look like, we take those characteristics, have our digital partners scan the web for publicly available information, and use their AI engine to identify any targets by geographic region that meet those risk characteristics," he explained. This information is then passed on to Fortegra's underwriting agency to assess opportunities for growth, improving the overall efficiency of the sales process. Data can now play a critical role, applied through an AI engine to ensure the hierarchy of risk characteristics is properly set. Kahlbaugh highlighted that this strategy allows underwriters and agents to be far more productive, and that applying technology to improve both relationships ultimately translates to better risks, lower loss ratios, and higher commission profits.